You are viewing the RapidMiner Studio documentation for version 10.0.
Group Into Collection (Operator Toolbox)
Synopsis
This operator generates a collection of ExampleSets from a single ExampleSet by performing a group-by operation on the input ExampleSet. The attribute to group by is specified with the ''group by attribute'' parameter. Each ExampleSet in the collection contains all examples that share one unique value of that attribute. The collection can be processed further by operators that take collections as input, such as the ''Loop Collection'' operator.
Input
- exa (Data Table)
The input ExampleSet which should be grouped into a collection.
Output
- col (Collection)
The resulting collection of ExampleSets.
- org (Data Table)
The original ExampleSet.
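The relationship between the input and the two outputs can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions, not the operator's implementation: the function name `group_into_collection` and the list-of-dicts representation of an ExampleSet are hypothetical.

```python
def group_into_collection(rows, group_by_attribute):
    """Split rows (a list of dicts) into one list per unique attribute value.

    Hypothetical helper: returns (collection, original), loosely mirroring
    the operator's 'col' and 'org' output ports.
    """
    groups = {}
    for row in rows:
        # Rows sharing the same attribute value end up in the same group.
        groups.setdefault(row[group_by_attribute], []).append(row)
    # The input is passed through unchanged as the second result.
    return list(groups.values()), rows


# Made-up rows, used only to show the shape of the result.
rows = [
    {"id": 1, "group": "a"},
    {"id": 2, "group": "b"},
    {"id": 3, "group": "a"},
]
collection, original = group_into_collection(rows, "group")
print(len(collection))                    # number of unique values
print([r["id"] for r in collection[0]])   # ids of the rows in the first group
```

Each element of `collection` plays the role of one ExampleSet in the resulting collection, and `original` is the untouched input.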
Parameters
- group_by_attribute
The name of the group by attribute. The resulting collection will contain one ExampleSet for each unique value of the attribute.
- sorting_order
The sorting order for the resulting collection.
- none: The collection is not ordered in any specific way.
- alphabetical: The collection is ordered by the natural order of the string representation of the values of the group by attribute, in short 0-9A-Za-z. Be aware that strings are compared character by character, so ''A11'' comes before ''A2''.
- numerical: The collection is ordered by the numerical values of the group by attribute in the original ExampleSet. The group by attribute must be numerical; otherwise a user error is thrown.
- occurrence: The collection is ordered by how often each value of the group by attribute occurs in the original ExampleSet.
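The differences between the sorting orders can be made concrete with a small Python sketch. The function `order_group_values` is hypothetical and makes two assumptions that this page does not confirm: ''none'' is modeled as first-seen order, and ''occurrence'' is modeled as most frequent first.

```python
from collections import Counter


def order_group_values(values, sorting_order="none"):
    """Return the unique values in the order their groups would appear."""
    unique = list(dict.fromkeys(values))  # drop duplicates, keep first-seen order
    if sorting_order == "none":
        # Assumption: the operator promises no specific order here.
        return unique
    if sorting_order == "alphabetical":
        # Character-by-character comparison of the string representation.
        return sorted(unique, key=str)
    if sorting_order == "numerical":
        if not all(isinstance(v, (int, float)) for v in unique):
            raise ValueError("numerical ordering needs a numerical attribute")
        return sorted(unique)
    if sorting_order == "occurrence":
        counts = Counter(values)
        # Assumption: most frequent value first; ties keep first-seen order.
        return sorted(unique, key=lambda v: -counts[v])
    raise ValueError(f"unknown sorting order: {sorting_order}")


print(order_group_values(["A2", "A11", "b"], "alphabetical"))
print(order_group_values([2, 11, 2], "alphabetical"))
print(order_group_values([2, 11, 2], "numerical"))
print(order_group_values(["x", "y", "y"], "occurrence"))
```

Comparing the second and third calls shows why the choice matters for numerical attributes: as strings, 11 sorts before 2, while numerically 2 sorts before 11.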
Tutorial Processes
Grouping an ExampleSet into a collection
In this tutorial we organize the Golf data set into a collection by grouping by the attribute ''Outlook''. For each ExampleSet in the collection, only the two examples with the highest temperatures are kept. Both the collection and the filtered ExampleSets are delivered to the results panel.
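The logic of this tutorial, grouping followed by a per-group selection, might look as follows in plain Python. This is a sketch, not the tutorial process itself: `top_two_per_group` is a hypothetical helper, and the rows are made-up stand-ins rather than the actual Golf data.

```python
def top_two_per_group(rows, group_by, sort_by):
    """Group rows by one attribute, then keep the two highest rows per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_by], []).append(row)
    return {
        value: sorted(members, key=lambda r: r[sort_by], reverse=True)[:2]
        for value, members in groups.items()
    }


# Illustrative rows only; the attribute names follow the tutorial description.
rows = [
    {"Outlook": "sunny", "Temperature": 30},
    {"Outlook": "sunny", "Temperature": 25},
    {"Outlook": "sunny", "Temperature": 28},
    {"Outlook": "rain", "Temperature": 18},
    {"Outlook": "rain", "Temperature": 21},
]
filtered = top_two_per_group(rows, "Outlook", "Temperature")
for outlook, members in filtered.items():
    print(outlook, [m["Temperature"] for m in members])
```

In RapidMiner the per-group step would run inside an operator that iterates over the collection, such as ''Loop Collection''.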
Demonstration of the different sorting orders
In this tutorial we use the Group Into Collection operator on the Titanic data set from the Samples folder. The operator is applied to the ExampleSet four times using the ''Age'' attribute, once with each sorting order. The differences are explained in the comments of the process.